Abstract

This paper presents a quantum algorithm that computes the product of two $n \times n$ Boolean
matrices in $\tilde{O}(n\sqrt{\ell} + \ell\sqrt{n})$ time, where $\ell$ is the number of non-zero entries in the product. This
improves the previous output-sensitive quantum algorithms for Boolean matrix multiplication
in the time complexity setting by Buhrman and Špalek (SODA'06) and Le Gall (SODA'12).
We also show that our approach cannot be further improved unless a breakthrough is made:
we prove that any significant improvement would imply the existence of an algorithm based
on quantum search that multiplies two $n \times n$ Boolean matrices in $O(n^{5/2-\varepsilon})$ time, for some
constant $\varepsilon > 0$.

1 Introduction 

1.1 Background 

Multiplying two Boolean matrices, where addition is interpreted as a logical OR and multiplica- 
tion as a logical AND, is a fundamental problem that has found applications in many areas of 
computer science (for instance, computing the transitive closure of a graph [H [151 or solving 
all-pairs path problems (5][9j[T7l[TjD). The product of two n x n Boolean matrices can be trivially 
computed in time $O(n^3)$. The best known algorithm is obtained by seeing the input matrices as
integer matrices, computing the product, and converting the product matrix to a Boolean matrix.
Using the algorithm by Coppersmith and Winograd [4] for multiplying integer matrices (and more
generally for multiplying matrices over any ring), or its recent improvements by Stothers [19] and
Vassilevska Williams [20], this gives a classical algorithm for Boolean matrix multiplication with
time complexity $O(n^{2.38})$.

This algebraic approach nevertheless has many disadvantages, the main one being that the huge
constants involved in the complexities make these algorithms impractical. Indeed, in the classical
setting, much attention has focused on algorithms that do not use reductions to matrix multiplication
over rings, but instead are based on search or on combinatorial arguments. Such algorithms
are often called combinatorial algorithms, and the main open problem in this field is to understand
whether an $O(n^{3-\varepsilon})$-time combinatorial algorithm, for some constant $\varepsilon > 0$, exists for Boolean
matrix multiplication. Unfortunately, there has been little progress on this question. The best known
combinatorial classical algorithm for Boolean matrix multiplication, by Bansal and Williams [2],
has time complexity $O(n^3/\log^{2.25}(n))$.

In the quantum setting, there exists a straightforward $\tilde{O}(n^{5/2})$-time¹ algorithm that computes
the product of two $n \times n$ Boolean matrices A and B: for each pair of indexes $i, j \in \{1, 2, \ldots, n\}$,
check if there exists an index $k \in \{1, \ldots, n\}$ such that $A[i, k] = B[k, j] = 1$ in time $\tilde{O}(\sqrt{n})$
using Grover's quantum search algorithm [10]. Buhrman and Špalek [3] observed that a similar
approach leads to a quantum algorithm that computes the product AB in $\tilde{O}(n^{3/2}\sqrt{\ell})$ time, where
$\ell$ denotes the number of non-zero entries in AB. Since the parameter $\ell \in \{0, \ldots, n^2\}$ represents
the sparsity of the output matrix, such an algorithm will be referred to as output-sensitive. Classical
output-sensitive algorithms for Boolean matrix multiplication have also been constructed recently:
Amossen and Pagh [1] constructed an algorithm with time complexity
$\tilde{O}(n^{1.724}\ell^{0.408} + n^{4/3}\ell^{2/3} + n^2)$, while Lingas [14] constructed an algorithm with time
complexity $\tilde{O}(n^2\ell^{0.188})$. The above $\tilde{O}(n^{3/2}\sqrt{\ell})$-time quantum algorithm beats both of them
when $\ell \le n^{1.602}$. Note that these two classical algorithms are based on the approach by
Coppersmith and Winograd [4] and are thus not combinatorial.

¹ In this paper the notation $\tilde{O}$ suppresses poly(log n) factors.
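The straightforward algorithm described above (one witness search per pair of indexes) can be mirrored classically. The following is only an illustrative sketch: the linear scan over k stands in for the quantum search, so it runs in cubic time rather than in the quantum bound, and the function name `boolean_product_with_witnesses` is a hypothetical choice, not notation from this paper. Indices are 0-based.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def boolean_product_with_witnesses(A: Matrix, B: Matrix) -> Dict[Tuple[int, int], int]:
    """Return {(i, j): k} for every non-zero entry (i, j) of the Boolean product AB.

    The scan over k is a classical stand-in for the quantum search over
    potential witnesses; k is an index with A[i][k] = B[k][j] = 1.
    """
    n = len(A)
    witnesses: Dict[Tuple[int, int], int] = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if A[i][k] == 1 and B[k][j] == 1:
                    witnesses[(i, j)] = k  # first witness found for entry (i, j)
                    break
    return witnesses
```

The returned dictionary has one key per non-zero entry of AB, so its size is the output-sensitivity parameter of the problem.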

Le Gall [13] has recently shown that there exists an output-sensitive quantum algorithm that
computes the product of two $n \times n$ Boolean matrices with time complexity $\tilde{O}(n^{3/2})$ if
$1 \le \ell \le n^{2/3}$ and $\tilde{O}(n\ell^{3/4})$ if $n^{2/3} \le \ell \le n^2$. This algorithm, which improves the
quantum algorithm by Buhrman and Špalek [3], was constructed by combining ideas from works by
Vassilevska Williams and Williams [21] and Lingas [14].

Several developments concerning the quantum query complexity of this problem, where the
complexity under consideration is the number of queries to the entries of the input matrices A and
B, have also happened. Output-sensitive quantum algorithms for Boolean matrix multiplication
in the query complexity setting were first proposed in [21], and then improved in [13]. Very
recently, Jeffery, Kothari and Magniez [11] significantly improved those results: they showed
that the quantum query complexity of computing the product of two $n \times n$ Boolean matrices
with $\ell$ non-zero entries is $\tilde{O}(n\sqrt{\ell})$, and gave a matching (up to polylogarithmic factors) lower
bound $\Omega(n\sqrt{\ell})$. The quantum query complexity of Boolean matrix multiplication may thus be
considered as settled.

Can the quantum time complexity of Boolean matrix multiplication be further improved as
well? The most fundamental question is of course whether there exists a quantum algorithm that
uses only quantum search or similar techniques with time complexity $O(n^{5/2-\varepsilon})$, for some
constant $\varepsilon > 0$, when $\ell \approx n^2$. This question is especially motivated by its apparently deep connection
to the design of subcubic-time classical combinatorial algorithms for Boolean matrix multiplication:
an $O(n^{5/2-\varepsilon})$-time quantum algorithm would correspond to an amortized cost of $O(n^{1/2-\varepsilon})$
per entry of the product, which may provide us with a new approach to develop a subcubic-time
classical combinatorial algorithm, i.e., an algorithm with amortized cost of $O(n^{1-\varepsilon})$ per entry of
the product. Studying quantum algorithms for Boolean matrix multiplication in the time complexity
setting can then, besides its own interest, be considered as a way to gain new insight about the
optimal value of the exponent of matrix multiplication in the general case (i.e., for dense output
matrices). In comparison, when the output matrix is dense, the classical and the quantum query
complexities of matrix multiplication are both trivially equal to $\Theta(n^2)$.

1.2 Statement of our results 

In this paper we build on the recent approach by Jeffery, Kothari and Magniez [11] to construct
a new time-efficient output-sensitive quantum algorithm for Boolean matrix multiplication. Our
main result is stated in the following theorem.

Theorem 1. There exists a quantum algorithm that computes the product of two $n \times n$ Boolean
matrices with time complexity $\tilde{O}(n\sqrt{\ell} + \ell\sqrt{n})$, where $\ell$ denotes the number of non-zero entries in
the product.



[Plot omitted. Legible curve label: Buhrman-Špalek (SODA'06).]

Figure 1: The upper bounds on the time complexity of quantum algorithms for matrix multiplication
given in Theorem 1 (in solid line). The horizontal axis represents the logarithm of $\ell$ with
respect to basis n (i.e., the value $\log_n(\ell)$). The vertical axis represents the logarithm of the
complexity with respect to basis n. The dashed line represents the upper bounds on the time complexity
obtained in [13], and the dotted line represents the upper bounds obtained in [3].

The upper bounds of Theorem 1 are illustrated in Figure 1. Our algorithm improves the quantum
algorithm by Le Gall [13] for any value of $\ell$ other than $\ell \approx n^2$ (we obtain the same upper
bound $\tilde{O}(n^{2.5})$ for $\ell \approx n^2$). It also beats the classical algorithms by Amossen and Pagh [1] and
Lingas [14] mentioned earlier, which are based on the algebraic approach, for any value $\ell \le n^{1.847}$
(i.e., whenever $\ell\sqrt{n} \le n^2\ell^{0.188}$).
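The crossover exponents quoted in this introduction can be checked with a few lines of arithmetic. The sketch below only compares against the bound $n^2\ell^{0.188}$ of Lingas and ignores polylogarithmic factors; the helper name `crossover_exponent` is hypothetical.

```python
import math


def crossover_exponent(n_exp_a: float, l_exp_a: float,
                       n_exp_b: float, l_exp_b: float) -> float:
    """Largest x with n^a1 * l^a2 <= n^b1 * l^b2 when l = n^x (assumes a2 > b2).

    Taking logarithms in base n gives a1 + a2*x <= b1 + b2*x.
    """
    return (n_exp_b - n_exp_a) / (l_exp_a - l_exp_b)


# n^{3/2} * l^{1/2} (Buhrman-Spalek) versus n^2 * l^{0.188} (Lingas).
x_bs = crossover_exponent(1.5, 0.5, 2.0, 0.188)
# l * n^{1/2} (the bound of Theorem 1 for l >= n) versus n^2 * l^{0.188}.
x_thm1 = crossover_exponent(0.5, 1.0, 2.0, 0.188)

# Truncated to three decimals, these are the exponents 1.602 and 1.847.
print(math.floor(x_bs * 1000) / 1000, math.floor(x_thm1 * 1000) / 1000)
```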

As will be explained in more detail below, for $\ell \le n$ our result can be seen as a time-efficient
version of the quantum algorithm constructed for the query complexity setting in [11]. The query
complexity lower bound $\Omega(n\sqrt{\ell})$ proved in [11] shows that the time complexity of our algorithm
is optimal, up to a possible polylogarithmic factor, for $\ell \le n$. The most interesting part of our
results is perhaps the upper bound $\tilde{O}(\ell\sqrt{n})$ we obtain for $\ell \ge n$, which corresponds to the case
where the output matrix is reasonably dense and differs from the query complexity upper bounds
obtained in [11]. We additionally show that, for values $\ell \ge n$, no quantum algorithm based on
search can perform better than ours unless there exists a quantum algorithm based on search that
computes the product of two arbitrary $n \times n$ Boolean matrices with time complexity significantly
better than $n^{5/2}$. The formal statement follows.

Theorem 2. Let $\delta$ be any function such that $\delta(n) > 0$ for all $n \in \mathbb{N}^+$. Suppose that, for some
value $\lambda \ge n$, there exists a quantum algorithm Q that, given as input any $n \times n$ Boolean matrices
A and B such that the number of non-zero entries in the product AB is at most $\lambda$, computes AB
in $O(\lambda\sqrt{n} \cdot n^{-\delta(n)})$ time. Then there exists an algorithm using Q as a black-box that computes
the product of two $n \times n$ Boolean matrices with overall time complexity $\tilde{O}(n^{5/2-\delta(n)} + n)$.

The reduction stated in Theorem 2 is actually classical and combinatorial: the whole algorithm
uses only classical combinatorial operations and calls to Q. Thus Theorem 2 implies that, if for
a given value $\ell \ge n$ the complexity of Theorem 1 can be improved to $O(\ell\sqrt{n}/n^{\varepsilon})$, for some
constant $\varepsilon > 0$, using techniques similar to ours (i.e., based on quantum search), then there exists
an algorithm based on quantum search (and classical combinatorial operations) that computes the
product of two $n \times n$ Boolean matrices with time complexity $\tilde{O}(n^{5/2-\varepsilon} + n)$.

1.3 Overview of our techniques

The main tool used to obtain our improvements is the new approach by Jeffery, Kothari and
Magniez [11] to find collisions in the graph associated with the multiplication of two $n \times n$ Boolean
matrices. More precisely, it was shown in [11] how to find up to t collisions in this graph, on
a quantum computer, using $\tilde{O}(\sqrt{nt} + \sqrt{\ell})$ queries, where $\ell$ is the number of non-zero entries
in the product. We construct (in Section 3) a time-efficient version of this algorithm that finds
one collision in $\tilde{O}(\sqrt{n} + \sqrt{\ell})$ time. We then use this algorithm to design a quantum algorithm
that computes the matrix product in time $\tilde{O}(n\sqrt{\ell})$ when $\ell = O(n)$, which proves Theorem 1 for
$\ell = O(n)$. Our key technique is the introduction of a small data structure that is still powerful
enough to enable time-efficient access to exactly all the information about the graph needed by the
quantum searches. More precisely, while the size of the graph considered is $O(n^2)$, we show that
the size of this data structure can be kept much smaller: roughly speaking, the idea is to keep
a record of the non-edges of the graph. Moreover, the data structure is carefully chosen so that
constructing it, at the beginning of the algorithm, can be done very efficiently (in $\tilde{O}(n)$ time), and
updating it during the execution of the algorithm can be done at a cost less than the running time
of the quantum searches.

We then prove that the ability to find up to n non-zero entries of the matrix product is
enough by showing (in Section 4) a classical reduction, for $\ell > n$, from the problem of computing
the product of two $n \times n$ Boolean matrices with at most $\ell$ non-zero entries in the product to
the problem of computing $\ell/n$ separate products of two Boolean matrices, each product having
at most $O(n)$ non-zero entries. The idea is to randomly permute the rows and columns of the
input matrices in order to make the output matrix homogeneous (in the sense that the non-zero
entries are distributed almost uniformly), in which case we can decompose the input matrices into
smaller blocks and ensure that each product of two smaller blocks contains, with non-negligible
probability, at most $O(n)$ non-zero entries. This approach is inspired by a technique introduced
by Lingas [14] and then generalized in [12, 13]. The main difference is that here we focus on
the number of non-zero entries in the product of each pair of blocks, while [12, 13, 14] focused
mainly on the size of the blocks. The upper bounds of Theorem 1 for $\ell \ge n$ follow directly from
our reduction, and a stronger version of this reduction leads to the proof of Theorem 2.

2 Preliminaries 

In this paper we suppose that the reader is familiar with quantum computation, and especially with
quantum search and its variants. We present below the model we are considering for accessing the
input matrices on a quantum computer, and computing their product. This model is the same as
the one used in [3, 13].

Let A and B be two $n \times n$ Boolean matrices, for any positive integer n (the model presented
below can be generalized to deal with rectangular matrices in a straightforward way). We suppose
that these matrices can be accessed directly by a quantum algorithm. More precisely, we have
an oracle $O_A$ that, for any $i, j \in \{1, \ldots, n\}$, any $a \in \{0, 1\}$ and any $z \in \{0, 1\}^*$, performs the
unitary mapping $O_A \colon |i\rangle|j\rangle|a\rangle|z\rangle \mapsto |i\rangle|j\rangle|a \oplus A[i,j]\rangle|z\rangle$, where $\oplus$ denotes the bit parity (i.e.,
the logical XOR). We have a similar oracle $O_B$ for B. Since we are interested in time complexity,
we will count all the computational steps of the algorithm and assign a cost of one for each call to
$O_A$ or $O_B$, which corresponds to the cases where quantum access to the inputs A and B can be
done at unit cost, for example in a random access model working in quantum superposition (we
refer to iPToll for an extensive treatment of such quantum random access memories). 

Let C = AB denote the product of the two matrices A and B. Given any indices
$i, j \in \{1, \ldots, n\}$ such that $C[i, j] = 1$, a witness for this non-zero entry is defined as an index
$k \in \{1, \ldots, n\}$ such that $A[i, k] = B[k, j] = 1$. We define a quantum algorithm for Boolean matrix
multiplication as follows.

Definition 2.1. A quantum algorithm for Boolean matrix multiplication is a quantum algorithm
that, when given access to oracles $O_A$ and $O_B$ corresponding to Boolean matrices A and B,
outputs with probability at least 2/3 all the non-zero entries of the product AB along with one
witness for each non-zero entry.

The complexity of several algorithms in this paper will be stated using an upper bound $\lambda$ on
the number $\ell$ of non-zero entries in the product AB. The same complexity, up to a logarithmic
factor, can actually be obtained even if no nontrivial upper bound on $\ell$ is known a priori. The idea
is, similarly to what was done in [21, 13], to try successively $\lambda = 2$ (and find up to 2 non-zero
entries), $\lambda = 4$ (and find up to 4 non-zero entries), ... and stop when no new non-zero entry is
found. The complexity of this approach is, up to a logarithmic factor, the complexity of the last
iteration (in which the value of $\lambda$ is $\lambda = 2^{\log_2 \ell + 1}$ if $\ell$ is a power of two, and $\lambda = 2^{\lceil \log_2 \ell \rceil}$
otherwise). In this paper we will then assume, without loss of generality, that a value $\lambda$ such that
$\ell \le \lambda \le 2\ell$ is always available.
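The doubling strategy can be sketched as follows. This is a minimal illustration under stated assumptions: `find_up_to` is a hypothetical subroutine that returns all non-zero entries when there are at most `lam` of them and `lam` of them otherwise, and the stopping rule "fewer than `lam` entries were found" is one reading of "no new non-zero entry is found" that reproduces the final values of the bound given above.

```python
from typing import Callable, Set, Tuple

Entry = Tuple[int, int]


def find_all_entries(find_up_to: Callable[[int], Set[Entry]]) -> Tuple[Set[Entry], int]:
    """Try lam = 2, 4, 8, ... until a run returns fewer than lam entries.

    Returns the entries found together with the last value of lam used.
    """
    lam = 2
    while True:
        found = find_up_to(lam)
        if len(found) < lam:
            # The budget was not exhausted, so nothing is left to find.
            return found, lam
        lam *= 2
```

With 3 non-zero entries the loop stops at the bound 4, and with 4 non-zero entries it stops at the bound 8, in line with the two cases stated in the text.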

3 Finding up to O(n) Non-zero Entries

Let A and B be the two $n \times n$ Boolean matrices of which we want to compute the product. In this
section we define, following [11], a graph collision problem and use it to show how to compute up
to $O(n)$ non-zero entries of AB.

Let $G = (I, J, E)$ be a bipartite undirected graph over two disjoint sets I and J, each of size n.
The edge set E is then a subset of $I \times J$. When there is no ambiguity it will be convenient to write
$I = \{1, \ldots, n\}$ and $J = \{1, \ldots, n\}$. We now define the concept of a collision for the graph G.

Definition 3.1. For any index $k \in \{1, \ldots, n\}$, a k-collision for G is an edge $(i, j) \in E$ such that
$A[i, k] = B[k, j] = 1$. A collision for G is an edge $(i, j) \in E$ that is a k-collision for some index
$k \in \{1, \ldots, n\}$.

We suppose that the graph G is given by a data structure $\mathcal{M}$ that contains the following
information:

• for each vertex u in I, the degree of u;

• for each vertex u in I, a list of all the vertices of J not connected to u.

The size of $\mathcal{M}$ is at most $\tilde{O}(n^2)$, but the key idea is that its size will be much smaller when G
is "close to" a complete bipartite graph. Using adequate data structures to implement $\mathcal{M}$ (e.g.,
using self-balancing binary search trees), we can perform the following four access operations in
poly(log n) time.

get-degree(u): get the degree of a vertex $u \in I$
check-connection(u, v): check if the vertices $u \in I$ and $v \in J$ are connected
get-vert_I(r, d): get the r-th smallest vertex in I that has degree at most d
get-vert_J(r, u): get the r-th smallest vertex in J not connected to $u \in I$

For the latter two access operations, the order on the vertices refers to the usual order $\le$ obtained
when seeing vertices in I and J as integers in $\{1, \ldots, n\}$. We assume that these two access
operations output an error message when the query is not well-defined (i.e., when r is too large).

Similarly, we can update $\mathcal{M}$ in poly(log n) time to take into consideration the removal of one
edge (u, v) from E (i.e., update the degree of u and update the list of vertices not connected
to u). This low complexity will be crucial since our main algorithm (in Proposition 3.2 below)
will successively remove edges from E.
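To make the semantics of the four access operations and of the update concrete, here is a reference sketch of the data structure. It is an assumption-laden illustration rather than the implementation intended in the text: the class name `NonEdgeStructure` and its method names are hypothetical, vertices are 0-based, the graph is assumed to start as the complete bipartite graph, and sorted Python lists replace self-balancing trees, so `remove_edge` and `get_vert_I` do not run in polylogarithmic time here.

```python
import bisect
from typing import List


class NonEdgeStructure:
    """Degrees of the vertices of I plus, for each u in I, its sorted non-neighbours in J."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.degree: List[int] = [n] * n  # complete bipartite graph at the start
        self.non_neighbours: List[List[int]] = [[] for _ in range(n)]

    def get_degree(self, u: int) -> int:
        return self.degree[u]

    def check_connection(self, u: int, v: int) -> bool:
        """True if (u, v) is still an edge, found by binary search in the non-edge list."""
        lst = self.non_neighbours[u]
        pos = bisect.bisect_left(lst, v)
        return not (pos < len(lst) and lst[pos] == v)

    def get_vert_I(self, r: int, d: int) -> int:
        """r-th smallest (r >= 1) vertex of I with degree at most d (linear scan)."""
        candidates = [u for u in range(self.n) if self.degree[u] <= d]
        if r < 1 or r > len(candidates):
            raise IndexError("query not well-defined: r is too large")
        return candidates[r - 1]

    def get_vert_J(self, r: int, u: int) -> int:
        """r-th smallest (r >= 1) vertex of J not connected to u."""
        lst = self.non_neighbours[u]
        if r < 1 or r > len(lst):
            raise IndexError("query not well-defined: r is too large")
        return lst[r - 1]

    def remove_edge(self, u: int, v: int) -> None:
        """Record the removal of edge (u, v); does nothing if it was already removed."""
        if self.check_connection(u, v):
            bisect.insort(self.non_neighbours[u], v)
            self.degree[u] -= 1
```

Only the non-edges are stored, so the memory used beyond the degree array grows with the number of removed edges rather than with the number of edges.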

Let L be an integer such that $0 \le L \le n^2$. We will define our graph collision problem,
denoted GRAPH COLLISION(n, L), as the problem of finding a collision for G under the promise
that $|E| \ge n^2 - L$, i.e., there are at most L missing edges in G. The formal definition is as follows.

GRAPH COLLISION(n, L)   [ here $n \ge 1$ and $0 \le L \le n^2$ ]
INPUT:   two $n \times n$ Boolean matrices A and B
         a bipartite graph $G = (I \cup J, E)$, with $|I| = |J| = n$, given by $\mathcal{M}$
         an index $k \in \{1, \ldots, n\}$
PROMISE: $|E| \ge n^2 - L$
OUTPUT:  one k-collision if such a collision exists

The following proposition shows that there exists a time-efficient quantum algorithm solving
this problem. The algorithm is similar to the query-efficient quantum algorithm given in [11], but
uses the data structure $\mathcal{M}$ in order to keep the time complexity low.

Proposition 3.1. There exists a quantum algorithm running in time $\tilde{O}(\sqrt{L} + \sqrt{n})$ that solves, with
high probability, the problem GRAPH COLLISION(n, L).

Proof. We will say that a vertex $i \in I$ is marked if $A[i, k] = 1$, and that a vertex $j \in J$ is marked
if $B[k, j] = 1$. Our goal is thus to find a pair $(i, j) \in E$ of marked vertices. The algorithm is as
follows.

We first use the minimum finding quantum algorithm from [6] to find the marked vertex u of
largest degree in I, in $\tilde{O}(\sqrt{n})$ time using get-degree(·) to obtain the order of a vertex from the
data structure $\mathcal{M}$. Let d denote the degree of u, let I' denote the set of vertices in I with degree
at most d, and let S denote the set of vertices in J connected to u. We then search for one marked
vertex in S, using Grover's algorithm [10] with check-connection(u, ·), in $\tilde{O}(\sqrt{n})$ time. If
we find one, then this gives us a k-collision and we end the algorithm. Otherwise we proceed as
follows. Note that, since each vertex in I' has at most d neighbors, by considering the number of
missing edges we obtain:

$$|I'| \cdot (n - d) \le n^2 - |E| \le L.$$

Also note that $|J \setminus S| = n - d$. We do a quantum search on $I' \times (J \setminus S)$ to find one pair of
connected marked vertices in time $\tilde{O}(\sqrt{|I'| \cdot |J \setminus S|}) = \tilde{O}(\sqrt{L})$, using get-vert_I(·, d) to access
the vertices in I' and get-vert_J(·, u) to access the vertices in $J \setminus S$. □
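The three searches of this proof can be walked through classically. The sketch below is illustrative only: each loop or `max` call replaces a quantum search, so the running time is not the one stated in Proposition 3.1. The function name `graph_collision` and the representation of G by its set of missing edges are assumptions made for the example, and indices are 0-based.

```python
from typing import List, Optional, Set, Tuple

Matrix = List[List[int]]


def graph_collision(A: Matrix, B: Matrix,
                    missing: Set[Tuple[int, int]], k: int) -> Optional[Tuple[int, int]]:
    """Return one k-collision (i, j) of the graph whose non-edges are `missing`, or None."""
    n = len(A)
    degree = [n] * n
    for (i, _) in missing:
        degree[i] -= 1

    marked_I = [i for i in range(n) if A[i][k] == 1]
    marked_J = [j for j in range(n) if B[k][j] == 1]
    if not marked_I:
        return None

    # Step 1: marked vertex u of largest degree (minimum finding in the proof).
    u = max(marked_I, key=lambda i: degree[i])
    d = degree[u]

    # Step 2: look for a marked vertex among the neighbours S of u.
    for j in marked_J:
        if (u, j) not in missing:
            return (u, j)

    # Step 3: search I' x (J \ S). Every marked vertex of I has degree at most d,
    # and no marked vertex of J lies in S, so any remaining collision is here.
    I_prime = [i for i in range(n) if degree[i] <= d]
    J_minus_S = [j for j in range(n) if (u, j) in missing]
    for i in I_prime:
        for j in J_minus_S:
            if A[i][k] == 1 and B[k][j] == 1 and (i, j) not in missing:
                return (i, j)
    return None
```

The size of the last search space is $|I'| \cdot (n - d)$, which is the quantity bounded by L in the proof.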

We now show how an efficient quantum algorithm that computes up to $O(n)$ non-zero entries
of the product of two $n \times n$ matrices can be constructed using Proposition 3.1.

Proposition 3.2. Let $\lambda$ be a known value such that $\lambda = O(n)$. Then there exists a quantum
algorithm that, given any $n \times n$ Boolean matrices A and B such that the number of non-zero
entries in the product AB is at most $\lambda$, computes AB in time $\tilde{O}(n\sqrt{\lambda})$.

Proof. Let A and B be two $n \times n$ Boolean matrices such that the product AB has at most $\lambda$
non-zero entries.

We associate with this matrix multiplication the bipartite graph $G = (V, E)$, where $V = I \cup J$
with $I = J = \{1, \ldots, n\}$, and define the edge set as $E = I \times J$. The two components I and J of
G are then fully connected: there is no missing edge. It is easy to see that computing the product
of A and B is equivalent to computing all the collisions, since a pair (i, j) is a collision if and only
if the entry in the i-th row and the j-th column of the product AB is 1.

To find all the collisions, we will basically repeat the following approach: for a given k,
search for a new k-collision in G and remove the corresponding edge from E by updating the data
structure $\mathcal{M}$ corresponding to G. Since we know that there are at most $\lambda$ non-zero entries in the
matrix product AB, at most $\lambda$ collisions will be found (and then removed). We are thus precisely
interested in finding collisions when $|E| \ge n^2 - \lambda$, i.e., when there are at most $\lambda$ missing edges in
G. We can then use the algorithm of Proposition 3.1. The main subtlety is that we cannot simply
try all the indexes k successively since the cost would be too high. Instead, we will search for
good indexes in a quantum way, as described in the next paragraph.

We partition the set of potential witnesses $K = \{1, \ldots, n\}$ into $m = \max(\lambda, n)$ subsets
$K_1, \ldots, K_m$, each of size at most $\lceil n/m \rceil$. Starting with s = 1, we repeatedly search for a
pair (i, j) that is a k-collision for some $k \in K_s$. This is done by doing a Grover search over
$K_s$ that invokes the algorithm of Proposition 3.1. Each time a new collision (i, j) is found (which
is a k-collision for some $k \in K_s$), we immediately remove the edge (i, j) from E by updating
the data structure $\mathcal{M}$. When no other collision is found, we move to $K_{s+1}$. We end the algorithm
when the last set $K_m$ has been processed.
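The control flow of this block-by-block search can be sketched classically. In this illustration the scan over a block stands in for the Grover search over $K_s$ and the scan over pairs stands in for the algorithm of Proposition 3.1, so only the order of operations is reproduced, not the running time. The function name `all_collisions` is hypothetical, the number of blocks `m` is left as a parameter, and indices are 0-based.

```python
from typing import Dict, List, Optional, Set, Tuple

Matrix = List[List[int]]


def all_collisions(A: Matrix, B: Matrix, m: int) -> Dict[Tuple[int, int], int]:
    """Collect every collision of the complete bipartite graph, one witness block at a time."""
    n = len(A)
    size = -(-n // m)  # ceil(n / m): maximum size of a block of witnesses
    blocks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    removed: Set[Tuple[int, int]] = set()  # edges deleted from E so far
    witnesses: Dict[Tuple[int, int], int] = {}

    def find_collision(block: range) -> Optional[Tuple[int, int, int]]:
        # Look for an edge still in E that is a k-collision for some k in the block.
        for k in block:
            for i in range(n):
                if A[i][k] != 1:
                    continue
                for j in range(n):
                    if B[k][j] == 1 and (i, j) not in removed:
                        return i, j, k
        return None

    for block in blocks:
        while True:
            hit = find_collision(block)
            if hit is None:
                break  # no collision left for this block: move to the next one
            i, j, k = hit
            witnesses[(i, j)] = k
            removed.add((i, j))  # corresponds to updating the data structure
    return witnesses
```

Each block costs one failed search in addition to the successful ones, which is the "+ 1" appearing in the complexity analysis that follows.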

This algorithm will find, with high probability, all the collisions in the initial graph, and thus
all the non-zero entries of AB. Let us examine its time complexity. We first discuss the complexity
of creating the data structure $\mathcal{M}$ (remember that updating $\mathcal{M}$ to take into consideration the removal
of one edge from E has polylogarithmic cost). Initially $|E| = n^2$, so each vertex of I has the same
degree n. Moreover, for each vertex $u \in I$, there is no vertex in J not connected to u. The cost
for creating $\mathcal{M}$ is thus $\tilde{O}(n)$ time. Next, we discuss the cost of the quantum search. Let $\lambda_s$ denote
the number of collisions found when examining the set $K_s$. Note that the search for collisions (the
Grover search that invokes the algorithm of Proposition 3.1) is done $\lambda_s + 1$ times when examining
$K_s$ (we need one additional search to decide that there is no other collision). Moreover, we have
$\lambda_1 + \cdots + \lambda_m \le \lambda$. The time complexity of the search is thus

$$\tilde{O}\left(\sum_{s=1}^{m} \sqrt{|K_s|} \times (\sqrt{\lambda} + \sqrt{n}) \times (\lambda_s + 1)\right) = \tilde{O}\left(\sqrt{n}\lambda + n\sqrt{\lambda}\right) = \tilde{O}\left(n\sqrt{\lambda}\right).$$

The overall time complexity of the algorithm is thus $\tilde{O}(n\sqrt{\lambda} + n) = \tilde{O}(n\sqrt{\lambda})$. □

4 Reduction to Several Matrix Multiplications 

Suppose that we have a randomized (or quantum) algorithm $\mathcal{A}$ that, given any $m \times n$ Boolean
matrix A and any $n \times m$ Boolean matrix B such that the number of non-zero entries in the product
AB is known to be at most L, computes AB with time complexity T(m, n, L). For the sake of
simplicity, we will make the following assumptions on $\mathcal{A}$:

(1) the time complexity of $\mathcal{A}$ does not exceed T(m, n, L) even if the input matrices do not
satisfy the promise (i.e., if there are more than L non-zero entries in the product);

(2) the algorithm $\mathcal{A}$ never outputs that a zero entry of the product is non-zero;

(3) if the matrix product has at most L non-zero entries, then with probability at least $1 - 1/n^3$
all these entries are found.

These assumptions can be made without loss of generality when considering quantum algorithms
for Boolean matrix multiplication as defined in Section 2. Assumption (1) can be guaranteed simply
by supposing that the algorithm systematically stops after T(m, n, L) steps. Assumption (2)
can be guaranteed since a witness is output for each potential non-zero entry found (the witness
can be used to immediately check the result). Assumption (3) can be guaranteed by repeating the
original algorithm (which has success probability at least 2/3) a logarithmic number of times.

The goal of this section is to show the following proposition.

Proposition 4.1. Let L be a known value such that $L \ge n$. Then, for any value $r \in \{1, \ldots, n\}$,
there exists an algorithm that, given any $n \times n$ Boolean matrices A and B such that the number
of non-zero entries in the product AB is at most L, uses algorithm $\mathcal{A}$ to compute with high
probability the product AB in time

$$\tilde{O}\left(r^2 \times T\left(\lceil n/r \rceil, n, 100\left(\frac{n}{r} + \frac{L}{r^2}\right)\right) + n\right).$$

We will need a lemma in order to prove Proposition 4.1. Let C be an $n \times n$ Boolean matrix
with at most L non-zero entries. Let r be a positive integer such that $r \le n$. We choose an
arbitrary partition $P_1$ of the set of rows of C into r blocks in which each block has size at most
$\lceil n/r \rceil$. Similarly, we choose an arbitrary partition $P_2$ of the set of columns of C into r blocks in
which each block has size at most $\lceil n/r \rceil$. This gives a decomposition of the matrix C into $r^2$
subarrays (each of size at most $\lceil n/r \rceil \times \lceil n/r \rceil$). We would like to say that, since C has L non-zero
entries, each subarray has at most $O(L/r^2)$ non-zero entries. This is of course not true in
general, but a similar statement will hold with high probability for a given subarray if we permute
the rows and the columns of C randomly. We formalize this idea in the following lemma, which
can be seen as an extension of a result proved by Lingas (Lemma 2 in [14]).

Lemma 4.1. Let C be an $n \times n$ Boolean matrix with at most L non-zero entries. Assume that $\sigma$
and $\tau$ are two permutations of the set $\{1, \ldots, n\}$ chosen independently uniformly at random. Let
C[i, j] be any non-zero entry of C. Then, with probability at least 9/10, after permuting the rows
according to $\sigma$ and the columns according to $\tau$, the subarray containing this non-zero entry (i.e.,
the subarray containing the entry in the $\sigma(i)$-th row and the $\tau(j)$-th column) has at most

$$10 \cdot \left(\frac{L\lceil n/r \rceil^2}{(n-1)^2} + 2\lceil n/r \rceil - 1\right)$$

non-zero entries.

Proof. Since the permutations $\sigma$ and $\tau$ are chosen independently uniformly at random, we can
consider that the values $\sigma(i)$ and $\tau(j)$ are first chosen, and that only after this choice the $2n - 2$
other values (i.e., $\sigma(i')$ for $i' \ne i$ and $\tau(j')$ for $j' \ne j$) are chosen.

Assume that the values $\sigma(i)$ and $\tau(j)$ have been chosen. Consider the subarray of C containing
the entry in the $\sigma(i)$-th row and the $\tau(j)$-th column. Let S denote the set of all the entries of the
subarray, and let $T \subseteq S$ denote the set of all the entries of the subarray that are in the $\sigma(i)$-th
row or in the $\tau(j)$-th column. Note that $|S| \le \lceil n/r \rceil^2$ and $|T| \le 2\lceil n/r \rceil - 1$. Let $X_s$, for each
$s \in S$, denote the random variable with value one if the entry s of the subarray is one, and value
zero otherwise. The random variable representing the number of non-zero entries in the subarray
is thus $Y = \sum_{s \in S} X_s$. The expectation of Y is

$$E[Y] = \sum_{s \in S} E[X_s] = \sum_{s \in S \setminus T} E[X_s] + \sum_{s \in T} E[X_s] \le |S \setminus T| \times \frac{L}{(n-1)^2} + |T| \le \frac{L\lceil n/r \rceil^2}{(n-1)^2} + 2\lceil n/r \rceil - 1.$$

The first inequality is obtained by using the inequality $E[X_s] \le 1$ for each entry $s \in T$, and by
noting that each of the non-zero entries of C that are neither in the i-th row nor in the j-th column
has probability exactly $\frac{1}{(n-1)^2}$ to be moved into a given entry in $S \setminus T$ by the permutations of the
remaining $n - 1$ rows and $n - 1$ columns. From Markov's inequality we obtain:

$$\Pr\left[Y \ge 10 \cdot \left(\frac{L\lceil n/r \rceil^2}{(n-1)^2} + 2\lceil n/r \rceil - 1\right)\right] \le \frac{1}{10}.$$

The statement of the lemma follows from the observation that the above inequality holds for
any choice of $\sigma(i)$ and $\tau(j)$. □
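The threshold appearing in the Markov step can be evaluated numerically. The sketch below assumes the threshold has the form $10 \cdot (L\lceil n/r \rceil^2/(n-1)^2 + 2\lceil n/r \rceil - 1)$, which is what the expectation bound in the proof suggests, and compares it with the simpler expression $100(n/r + L/r^2)$ used in the proof of Proposition 4.1. The function name `subarray_bound` is hypothetical.

```python
def subarray_bound(n: int, r: int, L: int) -> float:
    """Assumed threshold: 10 * (L * ceil(n/r)^2 / (n-1)^2 + 2 * ceil(n/r) - 1)."""
    b = -(-n // r)  # ceil(n / r), the maximum block size
    return 10 * (L * b * b / (n - 1) ** 2 + 2 * b - 1)


def simplified_bound(n: int, r: int, L: int) -> float:
    """Simpler upper bound 100 * (n/r + L/r^2), claimed to dominate for n >= 3."""
    return 100 * (n / r + L / (r * r))


# Exhaustive check of the domination claim on a small range of parameters.
ok = all(
    subarray_bound(n, r, L) <= simplified_bound(n, r, L)
    for n in range(3, 41)
    for r in range(1, n + 1)
    for L in (0, n, n * n)
)
print(ok)
```

The check relies on $\lceil n/r \rceil \le 2n/r$ for $r \le n$ and on $n^2/(n-1)^2 \le 9/4$ for $n \ge 3$.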

Proof of Proposition 4.1. Take two arbitrary partitions $P_1$, $P_2$ of the set $\{1, \ldots, n\}$ into r blocks
in which each block has size at most $\lceil n/r \rceil$. Let us write

$$\Delta = 10 \cdot \left(\frac{L\lceil n/r \rceil^2}{(n-1)^2} + 2\lceil n/r \rceil - 1\right).$$

It is easy to show that $\Delta \le 100(n/r + L/r^2)$ when $n \ge 3$. We will repeat the following procedure
$\lceil c \log n \rceil$ times, for some large enough constant c:

1. Permute the rows of A randomly and denote by $A^*$ the resulting matrix;
   Permute the columns of B randomly and denote by $B^*$ the resulting matrix;

2. Decompose $A^*$ into r smaller matrices $A^*_1, \ldots, A^*_r$ of size at most $\lceil n/r \rceil \times n$ by partitioning
   the rows of $A^*$ according to $P_1$;
   Decompose $B^*$ into r smaller matrices $B^*_1, \ldots, B^*_r$ of size at most $n \times \lceil n/r \rceil$ by partitioning
   the columns of $B^*$ according to $P_2$;

3. For each $s \in \{1, \ldots, r\}$ and each $t \in \{1, \ldots, r\}$, compute up to $\Delta$ non-zero entries of the
   product $A^*_s B^*_t$ using the algorithm $\mathcal{A}$.

The time complexity of this procedure is $\tilde{O}(r^2 \times T(\lceil n/r \rceil, n, \Delta) + n)$, where the additive
term $\tilde{O}(n)$ represents the time complexity of dealing with the permutations of rows and columns
(note that $A^*$, $B^*$, the $A^*_s$'s and the $B^*_t$'s do not have to be computed explicitly; we only need to be
able to recover in polylogarithmic time a given entry of these matrices).
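The three steps of the procedure can be sketched classically as follows. This is an illustration under stated assumptions: `bounded_product` is a hypothetical brute-force stand-in for the assumed algorithm (it never reports a zero entry and returns at most `cap` entries), the cap is taken to be $10 \cdot (L\lceil n/r \rceil^2/(n-1)^2 + 2\lceil n/r \rceil - 1)$ rounded up, the sub-matrices are built explicitly for readability, and indices are 0-based with $n \ge 2$.

```python
import math
import random
from typing import List, Set, Tuple

Matrix = List[List[int]]
Entry = Tuple[int, int]


def bounded_product(A: Matrix, B: Matrix, cap: int) -> Set[Entry]:
    """Stand-in for the assumed algorithm: brute force, stops after `cap` entries."""
    out: Set[Entry] = set()
    for i, row in enumerate(A):
        for j in range(len(B[0])):
            if len(out) >= cap:
                return out
            if any(row[k] and B[k][j] for k in range(len(B))):
                out.add((i, j))
    return out


def reduce_product(A: Matrix, B: Matrix, r: int, L: int,
                   rounds: int, rng: random.Random) -> Set[Entry]:
    """Non-zero entries of AB found by `rounds` repetitions of the permute-and-split procedure."""
    n = len(A)
    b = -(-n // r)  # ceil(n / r): maximum block size
    cap = math.ceil(10 * (L * b * b / (n - 1) ** 2 + 2 * b - 1))
    blocks = [list(range(s, min(s + b, n))) for s in range(0, n, b)]
    found: Set[Entry] = set()
    for _ in range(rounds):
        sigma = rng.sample(range(n), n)  # row i of A* is row sigma[i] of A
        tau = rng.sample(range(n), n)    # column j of B* is column tau[j] of B
        for rows in blocks:
            A_s = [A[sigma[i]] for i in rows]
            for cols in blocks:
                B_t = [[B[k][tau[j]] for j in cols] for k in range(n)]
                for (x, y) in bounded_product(A_s, B_t, cap):
                    # Map the entry of the small product back to its position in AB.
                    found.add((sigma[rows[x]], tau[cols[y]]))
    return found
```

Because the stand-in never reports a zero entry, every pair returned is a genuine non-zero entry of AB; repetition only affects whether all of them are found.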

We now show the correctness of the algorithm. First, the algorithm will never output that a
zero entry of AB is non-zero, from our assumption on the algorithm $\mathcal{A}$. Thus all the entries output
by the algorithm are non-zero entries of AB. The question is whether all the non-zero entries are
output.

Consider a fixed non-zero entry of AB. Note that each matrix product $A^*_s B^*_t$ corresponds
to a subarray of $A^*B^*$. From our assumptions on algorithm $\mathcal{A}$, this entry will be output with
probability $p \ge 1 - 1/n^3$ at Step 3 of the procedure if the entry (after permutation of the rows
and the columns) is in a subarray of $A^*B^*$ containing at most $\Delta$ non-zero entries. From Lemma
4.1 this happens with probability at least 9/10. With probability at least $1 - (1/10)^{\lceil c \log n \rceil}$ this
case will happen at least once during the $\lceil c \log n \rceil$ iterations of the procedure. By choosing the
constant c large enough, we have $1 - (1/10)^{\lceil c \log n \rceil} \ge 1 - 1/n^3$, and then the algorithm outputs
this non-zero entry with probability at least $(1 - 1/n^3)p \ge 1 - 2/n^3$. By the union bound we
conclude that the probability that the algorithm outputs all the non-zero entries of AB is at least
$1 - 2/n$. □



5 Proofs of Theorems 1 and 2

In this section we give the proofs of Theorems 1 and 2.

Proof of Theorem 1. Let A and B be two $n \times n$ Boolean matrices such that the product AB has $\ell$
non-zero entries. Remember that, as discussed in Section 2, an integer $\lambda \in \{1, \ldots, n^2\}$ such that
$\ell \le \lambda \le 2\ell$ is known.

If $\lambda \le n$ then the product AB can be computed in time $\tilde{O}(n\sqrt{\lambda}) = \tilde{O}(n\sqrt{\ell})$ by the algorithm
of Proposition 3.2. Now consider the case $n \le \lambda \le n^2$. By Proposition 4.1 (with the value
$r = \lceil \sqrt{\lambda/n} \rceil$), the product of A and B can be computed with complexity
$\tilde{O}\left(\frac{\lambda}{n} \times T(n, n, \Delta) + n\right)$, where $\Delta = O(n)$. Combined with Proposition 3.2 this gives a quantum
algorithm that computes the product AB in $\tilde{O}(\lambda\sqrt{n}) = \tilde{O}(\ell\sqrt{n})$ time. □

Proof of Theorem 2. Suppose the existence of a quantum algorithm that computes in time

$$O\left(\lambda\sqrt{n} \cdot n^{-\delta(n)}\right)$$

the product of any two $n \times n$ Boolean matrices such that the number of non-zero entries in their
product is at most $\lambda$. Let c be a positive constant. Using Proposition 4.1 with the values
$r = \lceil cn/\sqrt{\lambda} \rceil$ and $L = n^2$, we obtain a quantum algorithm that computes the product of two $n \times n$
Boolean matrices in time

$$\tilde{O}\left(\frac{n^2}{\lambda} \times T\left(n, n, \frac{100n}{r} + \frac{100n^2}{r^2}\right) + n\right).$$

By choosing the constant c large enough, we can rewrite this upper bound as

$$\tilde{O}\left(\frac{n^2}{\lambda} \times T(n, n, \lambda) + n\right) = \tilde{O}\left(n^{5/2-\delta(n)} + n\right),$$

which concludes the proof of the theorem. □